Columbia-Jadavpur submission for EMNLP 2016 Code-Switching Workshop Shared Task: System description
نویسندگان
چکیده
We describe our present system for language identification as a part of the EMNLP 2016 Shared Task. We were provided with the Spanish-English corpus composed of tweets. We have employed a predictor-corrector algorithm to accomplish the goals of this shared task and analyzed the results obtained.
منابع مشابه
The Howard University System Submission for the Shared Task in Language Identification in Spanish-English Codeswitching
This paper describes the Howard University system for the language identification shared task of the Second Workshop on Computational Approaches to Code Switching. Our system is based on prior work on SwahiliEnglish token-level language identification. Our system primarily uses character n-gram, prefix and suffix features, letter case and special character features along with previously existin...
متن کاملThe George Washington University System for the Code-Switching Workshop Shared Task 2016
We describe our work in the EMNLP 2016 second code-switching shared task; a generic language independent framework for linguistic code switch point detection (LCSPD). The system uses characters level 5-grams and word level unigram language models to train a conditional random fields (CRF) model for classifying input words into various languages. We participated in the Modern Standard Arabic (MS...
متن کاملSPMRL'13 Shared Task System: The CADIM Arabic Dependency Parser
We describe the submission from the Columbia Arabic & Dialect Modeling group (CADIM) for the Shared Task at the Fourth Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL’2013). We participate in the Arabic Dependency parsing task for predicted POS tags and features. Our system is based on Marton et al. (2013).
متن کاملThe Tel Aviv University System for the Code-Switching Workshop Shared Task
We describe our entry in the EMNLP 2014 code-switching shared task. Our system is based on a sequential classifier, trained on the shared training set using various characterand word-level features, some calculated using a large monolingual corpora. We participated in the Twitter-genre Spanish-English track, obtaining an accuracy of 0.868 when measured on the tweet level and 0.858 on the word l...
متن کاملAIDA: Identifying Code Switching in Informal Arabic Text
In this paper, we present the latest version of our system for identifying linguistic code switching in Arabic text. The system relies on Language Models and a tool for morphological analysis and disambiguation for Arabic to identify the class of each word in a given sentence. We evaluate the performance of our system on the test datasets of the shared task at the EMNLP workshop on Computationa...
متن کامل